For matrices with some special structure, it is possible to derive versions of algorithms which are faster and/or more accurate than the standard algorithms.
The reader should be familiar with concepts of eigenvalues and eigen vectors, singular values and singular vectors, related perturbation theory, and algorithms.
The reader should be able to recognise matrices which have rank-revealing decomposition and apply adequate algorithms, and to apply forward stable algorithms to arrowhead and diagonal-plus-rank-one matrices.
For more details, see Z. Drmač, Computing Eigenvalues and Singular Values to High Relative Accuracy and J. Demmel et al, Computing the singular value decomposition with high relative accuracy and the references therein.
Let $A\in\mathbb{R}^{m\times n}$ with $\mathop{\mathrm{rank}}(A)=n$ (therefore, $m\geq n$) and $A=U\Sigma V^T$ its thin SVD.
Let $A\in\mathbb{R}^{m\times n}$.
The singular values of $A$ are (perfectly) well determined to high relative accuracy if changing any entry $A_{kl}$ to $\theta A_{kl}$, $\theta \neq 0$, causes perturbations in singular values bounded by
$$ \min\{|\theta|,1/|\theta|\}\sigma_j \leq\tilde \sigma_j \leq \max\{|\theta|,1/|\theta|\}\sigma_j,\quad \forall j. $$The sparsity pattern of $A$, $Struct(A)$, is the set of indices for which $A_{kl}$ is permitted to be non-zero.
The bipartite graph of the sparsity pattern $S$, $\mathcal{G}(S)$, is the graph with vertices partitioned into row vertices $r_1,\ldots,r_m$ and column vertices $c_1,\ldots,c_n$, where $r_k$ and $c_l$ are connected if and only if $(k,l)\in S$.
If $\mathcal{G}(S)$ is acyclic, matrices with sparsity pattern $S$ are biacyclic.
A decomposition $A=XDY^T$ with diagonal matrix $D$ is called a rank revealing decomposition (RRD) if $X$ and $Y$ are full-column rank well-conditioned matrices.
Hilbert matrix is a square matrix $H$ with elements $H_{ij}=\displaystyle\frac{1}{i+j-1}$.
Hankel matrix is a square matrix with constant elements along skew-diagonals.
Cauchy matrix is an $m\times n$ matrix $C$ with elements $C_{ij}=\displaystyle\frac{1}{x_i+y_j}$ with $x_i+y_j\neq 0$ for all $i,j$.
The singular values of $A$ are perfectly well determined to high relative accuracy if and only if the bipartite graph $\mathcal{G}(S)$ is acyclic (forest of trees). Examples are bidiagonal and arrowhead matrices. Sparsity pattern $S$ of acyclic bipartite graph allows at most $m+n-1$ nonzero entries. A bisection algorithm computes all singular values of biacyclic matrices to high relative accuracy.
An RRD of $A$ can be given or computed to high accuracy by some method. Typical methods are Gaussian elimination with complete pivoting or QR factorization with complete pivoting.
Let $\hat X \hat D \hat Y^T$ be the computed RRD of $A$ satisfying \begin{align*} |D_{jj}-\hat D_{jj}|&\leq O(\varepsilon)|D_{jj}|,\\ \| X-\hat X\|&\leq O(\varepsilon) \|X\|,\\ \| Y-\hat Y\|&\leq O(\varepsilon) \|Y\|. \end{align*} The following algorithm computes the EVD of $A$ with high relative accuracy:
Let $R=D'R'$, where $D'$ is such that the rows of $R'$ have unit norms. Then the following error bounds hold: $$ \frac{|\sigma_j-\tilde\sigma_j|}{\sigma_j}\leq O(\varepsilon \kappa(R')\cdot \max\{\kappa(X),\kappa(Y)\})\leq O(\varepsilon n^{3/2}\kappa(X)\cdot \max\{\kappa(X),\kappa(Y)\}). $$
Hilbert matrix is Hankel matrix and Cauchy matrix, it is symmetric positive definite and very ill-conditioned.
Every sumbatrix of a Cauchy matrix is itself a Cauchy matrix.
Determinant of a square Cauchy matrix is $$ \det(C)=\frac{\prod_{1\leq i<j\leq n}(x_j-x_i)(y_j-y_i)} {\prod_{1\leq i,j\leq n} (x_i+y_j)}. $$ It is computed with elementwise high relative accuracy.
Let $A$ be square and nonsingular and let $A=LDR$ be its decomposition with diagonal $D$, lower unit-triangular $L$, and upper unit-triangular $R$. The closed formulas using quotients of minors are (see A. S. Householder, The Theory of Matrices in Numerical Analysis):
Let $A=DA_S D$ be strongly scaled symmetric positive definite matrix. Then Cholesky factorization with complete (diagonal) pivoting is an RRD. Consider the following three step algorithm:
The Cholesky factorization with pivoting can be implemented very fast with block algorithm (see C. Lucas, LAPack-Style Codes for Level 2 and 3 Pivoted Cholesky Factorizations).
The eigenvalues $\tilde \lambda_j$ computed using the above algorithm satisfy relative error bounds: $$ \frac{|\lambda_j-\tilde\lambda_j|}{\lambda_j} \leq O(n\varepsilon \|A_S\|_2^{-1}). $$
In [1]:
include("./ModuleB.jl")
Out[1]:
In [2]:
using .ModuleB
In [3]:
using LinearAlgebra
In [4]:
n=20
import Random
Random.seed!(421)
B=randn(n,n)
# Scaled matrix
As=Matrix(Symmetric(B'*B))
# Scaling
D=exp.(50*(rand(n).-0.5))
# Parentheses are necessary!
A=[As[i,j]*(D[i]*D[j]) for i=1:n, j=1:n]
issymmetric(A), cond(As), cond(A)
Out[4]:
In [5]:
?cholesky;
We will not use the Cholesky factorization with complete pivoting. Instead, we will just sort the diagonal of $A$ in advance, which is sufficient for this example.
Write the function for Cholesky factorization with complete pivoting as an excercise.
In [6]:
?sortperm;
In [7]:
p=sortperm(diag(A), rev=true)
L=cholesky(A[p,p])
Out[7]:
In [8]:
U,σ,V=myJacobiR(Matrix(L.L));
In [9]:
methods(myJacobiR)
Out[9]:
In [10]:
λ=σ.^2
U₁=U[invperm(p),:]
λ
Out[10]:
In [11]:
U'*A[p,p]*U
Out[11]:
In [12]:
# Due to large condition number, this is not
# as accurate as expected
C=U₁'*A*U₁
Out[12]:
In [13]:
# Orthogonality
norm(U₁'*U₁-I)
Out[13]:
In [14]:
Dc=sqrt.(diag(C))
Cs=[C[i,j]/(Dc[i]*Dc[j]) for i=1:n, j=1:n]
Out[14]:
In [15]:
K=U₁*Diagonal(σ)
K'*K
Out[15]:
In [16]:
# Explain why is the residual so large.
norm(A*U₁-U₁*Diagonal(λ))
Out[16]:
In [17]:
# Relative residual is percfect
norm(A*U₁-U₁*Diagonal(λ))/norm(A)
Out[17]:
In [18]:
[λ sort(eigvals(A),rev=true)]
Out[18]:
We need the newest version of the package SpecialMatrices.jl.
In [19]:
# Pkg.checkout("SpecialMatrices")
using SpecialMatrices
In [20]:
varinfo(SpecialMatrices)
Out[20]:
In [21]:
C=Cauchy([1,2,3,4,5],[0,1,2,3,4])
Out[21]:
In [22]:
H=Hilbert(5)
Out[22]:
In [23]:
Hf=Matrix(H)
Out[23]:
In [24]:
# This is exact
det(Hf)
Out[24]:
In [25]:
# Exact formula for the determinant of a Cauchy matrix from Fact 7.
import LinearAlgebra.det
function det(C::Cauchy{T}) where T
n=length(C.x)
F=triu([(C.x[j]-C.x[i])*(C.y[j]-C.y[i]) for i=1:n, j=1:n],1)
num=prod(F[findall(!iszero,F)])
den=prod([(C.x[i]+C.y[j]) for i=1:n, j=1:n])
if all(isinteger,C.x)&all(isinteger,C.y)
return num//den
else
return num/den
end
end
Out[25]:
In [26]:
det(C)
Out[26]:
We now compute componentwise highly accurate $A=LDL ^T$ factorization of a Hilbert (Cauchy) matrix. Using Rational
numbers gives high accuracy.
In [27]:
# Exact LDLT factorization from Fact 8, no pivoting.
function myLDLT(C::Cauchy)
n=length(C.x)
T=typeof(C.x[1])
D=Array{Rational{T}}(undef,n)
L=Matrix{Rational{T}}(I,n,n)
δ=[det(Cauchy(C.x[1:j],C.y[1:j])) for j=1:n]
D[1]=map(Rational{T},C[1,1])
D[2:n]=δ[2:n]./δ[1:n-1]
for i=2:n
for j=1:i-1
L[i,j]=det(Cauchy( C.x[[1:j-1;i]], C.y[1:j])) / δ[j]
end
end
L,D
end
Out[27]:
In [28]:
L,D=myLDLT(C)
L
Out[28]:
In [29]:
D
Out[29]:
In [30]:
L*Diagonal(D)*L' # -Matrix(H)
Out[30]:
In [31]:
# L*D*L' is an RRD
cond(L)
Out[31]:
In [32]:
cond(C)
Out[32]:
We now compute the accurate EVD of the Hilbert matrix of order $n=100$. We cannot use the function myLDLT()
since the computation of determinant causes overflow and there is no pivoting. Instead, we use Algorithm 3 from
J. Demmel, Computing the singular value decomposition with high relative accuracy.
In [33]:
function myGECP(C::Cauchy)
n=length(C.x)
G=Matrix(C)
x=copy(C.x)
y=copy(C.y)
pr=collect(1:n)
pc=collect(1:n)
# Find the maximal element
for k=1:n-1
i,j=Tuple(argmax(abs.(G[k:n,k:n])))
i+=k-1
j+=k-1
if i!=k || j!=k
G[[i,k],:]=G[[k,i],:]
G[:, [j,k]]=G[:, [k,j]]
x[[k,i]]=x[[i,k]]
y[[k,j]]=y[[j,k]]
pr[[i,k]]=pr[[k,i]]
pc[[j,k]]=pc[[k,j]]
end
for r=k+1:n
for s=k+1:n
G[r,s]=G[r,s]*(x[r]-x[k])*(y[s]-y[k])/
((x[k]+y[s])*(x[r]+y[k]))
end
end
G=Matrix(Symmetric(G))
end
D=diag(G)
X=tril(G,-1)*Diagonal(1.0./D)+I
Y=Diagonal(1.0./D)*triu(G,1)+I
X,D,Y', pr,pc
end
Out[33]:
In [34]:
# First a smaller test
l=8
C=Cauchy(collect(1:l),collect(0:l-1))
Out[34]:
In [35]:
X,D,Y,pr,pc=myGECP(C)
Out[35]:
In [36]:
norm((X*Diagonal(D)*Y')-Matrix(C)[pr,pc])
Out[36]:
In [37]:
norm(X[invperm(pr),:]*Diagonal(D)*Y[invperm(pc),:]'-Matrix(C))
Out[37]:
In [38]:
# Now the big test.
n=100
H=Hilbert(n)
C=Cauchy(collect(1:n), collect(0:n-1))
Out[38]:
We need a function to compute RRD from myGECP()
In [39]:
function myRRD(C::Cauchy)
X,D,Y,pr,pc=myGECP(C)
X[invperm(pr),:], D, Y[invperm(pc),:]
end
Out[39]:
In [40]:
X,D,Y=myRRD(C);
In [41]:
# Check
norm((X*Diagonal(D)*Y')-Matrix(C))
Out[41]:
In [42]:
cond(C)
Out[42]:
In [43]:
# Is this RRD? here X=Y
cond(X), cond(Y)
Out[43]:
In [44]:
# Algorithm from Fact 3
function myRRDSVD(X,D,Y)
Q,R,p=qr(X*Diagonal(D),Val(true))
W=R[:,p]*Y'
V,σ,U₁=myJacobiR(W')
U=Q*U₁
U,σ,V
end
Out[44]:
In [45]:
U,σ,V=myRRDSVD(X,D,Y);
In [46]:
# Check residual and orthogonality
norm(Matrix(C)*V-U*Diagonal(σ)), norm(U'*U-I), norm(V'*V-I)
Out[46]:
In [47]:
# Observe the difference!!
[sort(σ) sort(svdvals(C)) sort(eigvals(Matrix(C)))]
Out[47]:
In [48]:
# Plot the eigenvalues (singular values) and left singular vectors
using Plots
In [49]:
plot(σ,yscale = :log10)
Out[49]:
In [52]:
# using SparseArrays
# spy(abs.(sparse(U)))
In [50]:
# Better spy
# some options :bluesreds,clim=(-1.0,1.0)
import Plots.spy
spy(A)=heatmap(A, yflip=true, color=:bluesreds, aspectratio=1)
Out[50]:
In [51]:
spy(U)
Out[51]:
For more details, see N. Jakovčević Stor, I. Slapničar and J. Barlow, Accurate eigenvalue decomposition of real symmetric arrowhead matrices and applications and N. Jakovčević Stor, I. Slapničar and J. Barlow, Forward stable eigenvalue decomposition of rank-one modifications of diagonal matrices.
An arrowhead matrix is a real symmetric matrix of order $n$ of the form $A=\begin{bmatrix} D & z \\ z^{T} & \alpha \end{bmatrix}$, where $D=\mathop{\mathrm{diag}}(d_{1},d_{2},\ldots ,d_{n-1})$, $z=\begin{bmatrix} \zeta _{1} & \zeta _{2} & \cdots & \zeta _{n-1} \end{bmatrix}^T$ is a vector, and $\alpha$ is a scalar.
An arrowhead matrix is irreducible if $\zeta _{i}\neq 0$ for all $i$ and $d_{i}\neq d_{j}$ for all $i\neq j$.
A diagonal-plus-rank-one matrix (DPR1 matrix) is a real symmetric matrix of order $n$ of the form $A= D +\rho z z^T$, where $D=\mathop{\mathrm{diag}}(d_{1},d_{2},\ldots ,d_{n})$, $z=\begin{bmatrix} \zeta _{1} & \zeta _{2} & \cdots & \zeta _{n} \end{bmatrix}^T$ is a vector, and $\rho \neq 0$ is a scalar.
A DPR1 matrix is irreducible if $\zeta _{i}\neq 0$ for all $i$ and $d_{i}\neq d_{j}$ for all $i\neq j$.
Let $A$ be an arrowhead matrix of order $n$ and let $A=U\Lambda U^T$ be its EVD.
If $d_i$ and $\lambda_i$ are nonincreasingy ordered, the Cauchy Interlace Theorem implies $$\lambda _{1}\geq d_{1}\geq \lambda _{2}\geq d_{2}\geq \cdots \geq d_{n-2}\geq\lambda _{n-1}\geq d_{n-1}\geq \lambda _{n}. $$
If $\zeta _{i}=0$ for some $i$, then $d_{i}$ is an eigenvalue whose corresponding eigenvector is the $i$-th unit vector, and we can reduce the size of the problem by deleting the $i$-th row and column of the matrix. If $d_{i}=d_{j}$, then $d_{i}$ is an eigenvalue of $A$ (this follows from the interlacing property) and we can reduce the size of the problem by annihilating $\zeta_j$ with a Givens rotation in the $(i,j)$-plane.
If $A$ is irreducible, the interlacing property holds with strict inequalities.
The eigenvalues of $A$ are the zeros of the Pick function (also, secular equation) $$ f(\lambda )=\alpha -\lambda -\sum_{i=1}^{n-1}\frac{\zeta _{i}^{2}}{% d_{i}-\lambda }=\alpha -\lambda -z^{T}(D-\lambda I)^{-1}z, $$ and the corresponding eigenvectors are $$ U_{:,i}=\frac{x_{i}}{\left\Vert x_{i}\right\Vert _{2}},\quad x_{i}=\begin{bmatrix} \left( D-\lambda _{i}I\right) ^{-1}z \\ -1% \end{bmatrix}, \quad i=1,\ldots ,n. $$
Let $A$ be irreducible and nonsingular. If $d_i\neq 0$ for all $i$, then $A^{-1}$ is a DPR1 matrix $$ A^{-1}=\begin{bmatrix} D^{-1} & \\ & 0 \end{bmatrix} + \rho uu^{T}, $$ where $u=\begin{bmatrix} D^{-1}z \\ -1 \end{bmatrix}$, and $\rho =\displaystyle\frac{1}{\alpha-z^{T}D^{-1}z}$. If $d_i=0$, then $A^{-1}$ is a permuted arrowhead matrix, $$ A^{-1}\equiv \begin{bmatrix} D_{1} & 0 & 0 & z_{1} \\ 0 & 0 & 0 & \zeta _{i} \\ 0 & 0 & D_{2} & z_{2} \\ z_{1}^{T} & \zeta _{i} & z_{2}^{T} & \alpha \end{bmatrix}^{-1} = \begin{bmatrix} D_{1}^{-1} & w_{1} & 0 & 0 \\ w_{1}^{T} & b & w_{2}^{T} & 1/\zeta _{i} \\ 0 & w_{2} & D_{2}^{-1} & 0 \\ 0 & 1/\zeta _{i} & 0 & 0 \end{bmatrix}, $$ where \begin{align*} w_{1}&=-D_{1}^{-1}z_{1}\displaystyle\frac{1}{\zeta _{i}},\\ w_{2}&=-D_{2}^{-1}z_{2}\displaystyle\frac{1}{\zeta _{i}},\\ b&= \displaystyle\frac{1}{\zeta _{i}^{2}}\left(-\alpha +z_{1}^{T}D_{1}^{-1}z_{1}+z_{2}^{T}D_{2}^{-1}z_{2}\right). \end{align*}
The algorithm based on the following approach computes all eigenvalues and all components of the corresponding eigenvectors in a forward stable manner to almost full accuracy in $O(n)$ operations per eigenpair:
The algorithm is implemented in the package Arrowhead.jl. In certain cases, $b$ or $\rho$ need to be computed with extended precision. For this, we use the functions from file DoubleDouble.jl, originally from the the package DoubleDouble.jl.
In [57]:
using Arrowhead
a=2.0
b=3.0
Out[57]:
In [58]:
sqrt(a),sqrt(3.0)
Out[58]:
In [59]:
sqrt(BigFloat(a)),sqrt(BigFloat(3.0))
Out[59]:
In [60]:
# Doible numbers according to Dekker, 1971
ad=Arrowhead.Double(a);bd=Arrowhead.Double(b)
Out[60]:
In [61]:
ra=sqrt(ad)
Out[61]:
In [62]:
rb=sqrt(bd)
Out[62]:
In [63]:
BigFloat(rb.hi)+BigFloat(rb.lo)
Out[63]:
In [64]:
sqrt(BigFloat(a))*sqrt(BigFloat(3.0))
Out[64]:
In [65]:
p=ra*rb
Out[65]:
In [66]:
BigFloat(p.hi)+BigFloat(p.lo)
Out[66]:
In [67]:
# pkg> add Arrowhead#master
using Arrowhead
In [68]:
varinfo(Arrowhead)
Out[68]:
In [69]:
methods(GenSymArrow)
Out[69]:
In [70]:
n=10
A=GenSymArrow(n,n)
Out[70]:
In [71]:
# Elements of the type SymArrow
A.D, A.z, A.a, A.i
Out[71]:
In [72]:
(λ,U),info=eigen(A)
norm(A*U-U*Diagonal(λ)), norm(U'*U-I)
Out[72]:
In [73]:
# Timings - notice the O(n^2)
@time eigen(GenSymArrow(1000,1000));
@time eigen(GenSymArrow(2000,2000));
In [74]:
A=SymArrow( [ 1e10+1.0/3.0, 4.0, 3.0, 2.0, 1.0 ],
[ 1e10 - 1.0/3.0, 1.0, 1.0, 1.0, 1.0 ], 1e10, 6 )
Out[74]:
In [75]:
(λ,U),info=eigen(A)
[sort(λ) sort(eigvals(Matrix(A))) sort(λ)-sort(eigvals(Matrix(A)))]
Out[75]:
The properties of DPR1 matrices are very similar to those of arrowhead matrices. Let $A$ be a DPR1 matrix of order $n$ and let $A=U\Lambda U^T$ be its EVD.
If $d_i$ and $\lambda_i$ are nonincreasingy ordered and $\rho>0$, then $$\lambda _{1}\geq d_{1}\geq \lambda _{2}\geq d_{2}\geq \cdots \geq d_{n-2}\geq\lambda _{n-1}\geq d_{n-1}\geq \lambda _{n}\geq d_n. $$ If $A$ is irreducible, the inequalities are strict.
Facts 2 on arrowhead matrices holds.
The eigenvalues of $A$ are the zeros of the secular equation $$ f(\lambda )=1+\rho\sum_{i=1}^{n}\frac{\zeta _{i}^{2}}{d_{i}-\lambda } =1 +\rho z^{T}(D-\lambda I)^{-1}z=0, $$ and the corresponding eigenvectors are $$ U_{:,i}=\frac{x_{i}}{\left\Vert x_{i}\right\Vert _{2}},\quad x_{i}=( D-\lambda _{i}I) ^{-1}z. $$
Let $A$ be irreducible and nonsingular. If $d_i\neq 0$ for all $i$, then $$ A^{-1}=D^{-1} +\gamma uu^{T},\quad u=D^{-1}z, \quad \gamma =-\frac{\rho}{1+\rho z^{T}D^{-1}z}, $$ is also a DPR1 matrix. If $d_i=0$, then $A^{-1}$ is a permuted arrowhead matrix, $$ A^{-1}\equiv \left(\begin{bmatrix} D_{1} & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & D_{2} \end{bmatrix} +\rho \begin{bmatrix} z_{1} \\ \zeta _{i} \\ z_{2} \end{bmatrix} \begin{bmatrix} z_{1}^{T} & \zeta _{i} & z_{2}^{T} \end{bmatrix}\right)^{-1}= \begin{bmatrix} D_{1}^{-1} & w_{1} & 0 \\ w_{1}^{T} & b & w_{2}^{T} \\ 0 & w_{2} & D_{2}^{-1} \end{bmatrix}, $$ where \begin{align*} w_{1}&=-D_{1}^{-1}z_{1}\displaystyle\frac{1}{\zeta _{i}},\\ w_{2}&=-D_{2}^{-1}z_{2}\displaystyle\frac{1}{\zeta _{i}},\\ b &=\displaystyle\frac{1}{\zeta _{i}^{2}}\left( \frac{1}{\rho}+z_{1}^{T}D_{1}^{-1}z_{1}+z_{2}^{T}D_{2}^{-1}z_{2}\right). \end{align*}
The algorithm based on the same approach as above, computes all eigenvalues and all components of the corresponding eigenvectors in a forward stable manner to almost full accuracy in $O(n)$ operations per eigenpair.
The algorithm is implemented in the package Arrowhead.jl
. In certain cases, $b$ or $\gamma$ need to be computed with extended precision.
In [76]:
n=10
A=GenSymDPR1(n)
Out[76]:
In [77]:
# Elements of the type SymDPR1
A.D, A.u, A.r
Out[77]:
In [78]:
(λ,U),info=eigen(A)
norm(A*U-U*Diagonal(λ)), norm(U'*U-I)
Out[78]:
In [79]:
A=SymDPR1( [ 10.0/3.0, 2.0+1e-7, 2.0-1e-7, 1.0 ], [ 2.0, 1e-7, 1e-7, 2.0], 1.0 )
A=SymDPR1( [ 1e10, 5.0, 4e-3, 0.0, -4e-3,-5.0 ], [ 1e10, 1.0, 1.0, 1e-7, 1.0,1.0 ], 1.0 )
Out[79]:
In [80]:
(λ,U),info=eigen(A)
println([sort(λ) sort(eigvals(Matrix(A)))])
norm(A*U-U*Diagonal(λ)), norm(U'*U-I)
Out[80]: